chore: improve consistency across GPU CI workflows #160

Merged
dims merged 1 commit into NVIDIA:main from dims:improve-gpu-ci-workflows
Feb 20, 2026
dims (Collaborator) commented Feb 20, 2026

Summary

  • Add eidos snapshot + validate steps to training workflow (parity with inference)
  • Add conformance evidence collection to training workflow
  • Stagger cron schedules to reduce runner contention (T4 :00, inference :15, training :30)
  • Normalize branch filter to pull-request/[0-9]+ in T4 smoke test (was pull-request/*)
  • Pass unique artifact_name_prefix to gpu-test-cleanup from each H100 workflow
  • Generalize gpu-test-cleanup: replace hardcoded gpu-smoke-test pod reference with non-running pods listing
  • Add CNCF AI conformance validations to inference workflow:
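
As a rough illustration of the cron staggering listed in the summary, the sketch below checks how far apart the three workflows start within an hour. Only the minute offsets (:00, :15, :30) come from this PR; the workflow names and the hour field are hypothetical, and the check is only meaningful if all schedules share the same hour field.

```python
# Hypothetical cron expressions: only the minute offsets (:00, :15, :30) come
# from the PR summary. The workflow names and the "*/6" hour field are
# assumptions made for illustration.
SCHEDULES = {
    "t4-smoke-test": "0 */6 * * *",
    "h100-inference": "15 */6 * * *",
    "h100-training": "30 */6 * * *",
}


def minute_field(cron: str) -> int:
    """Return the minute field of a five-field cron expression as an int."""
    fields = cron.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 cron fields, got {len(fields)}: {cron!r}")
    return int(fields[0])


def min_gap_minutes(schedules: dict) -> int:
    """Smallest gap between start minutes, wrapping around the hour."""
    minutes = sorted(minute_field(c) for c in schedules.values())
    gaps = [b - a for a, b in zip(minutes, minutes[1:])]
    # Gap from the last start in one hour to the first start in the next.
    gaps.append(60 - minutes[-1] + minutes[0])
    return min(gaps)


if __name__ == "__main__":
    print(min_gap_minutes(SCHEDULES))  # 15
```

With these offsets no two workflows start within 15 minutes of each other, which is the property the staggering is meant to provide.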

Test plan

  • Verify training workflow runs snapshot/validate successfully on H100 x2 runner
  • Verify conformance evidence collection runs in training workflow
  • Verify inference gateway validation passes (GatewayClass + Gateway)
  • Verify DCGM metrics are scraped by Prometheus and custom metrics API is available
  • Verify secure accelerator access checks pass (DRA-only, no hostPath)
  • Verify T4 smoke test is unaffected
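
For the T4 smoke test item, the effect of the branch filter change can be sketched with regex approximations of the two patterns. This is not how GitHub Actions evaluates filters internally; it assumes the documented glob semantics (`*` matches any run of characters except `/`, `+` means one or more of the preceding character), and the branch names are hypothetical.

```python
import re

# Regex approximations of the two branch filters. Assumption: GitHub Actions
# glob semantics, where `*` does not cross `/` and `[0-9]+` is one or more digits.
OLD_FILTER = re.compile(r"pull-request/[^/]*")   # was: pull-request/*
NEW_FILTER = re.compile(r"pull-request/[0-9]+")  # now: pull-request/[0-9]+


def matches(pattern: re.Pattern, branch: str) -> bool:
    """True if the whole branch name matches the pattern."""
    return pattern.fullmatch(branch) is not None


if __name__ == "__main__":
    # Hypothetical branch names, chosen to show where the filters differ.
    for branch in ["pull-request/160", "pull-request/wip", "main"]:
        print(branch, matches(OLD_FILTER, branch), matches(NEW_FILTER, branch))
```

Under these assumptions, numeric branches such as `pull-request/160` trigger the workflow with either filter, while a non-numeric branch such as `pull-request/wip` is matched only by the old one.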

dims requested a review from a team as a code owner February 20, 2026 00:49

- Add eidos snapshot + validate to training workflow
- Add conformance evidence collection to training workflow
- Stagger cron schedules (T4 :00, inference :15, training :30)
- Normalize branch filter to pull-request/[0-9]+ in T4 smoke test
- Pass unique artifact_name_prefix to gpu-test-cleanup from each workflow
- Generalize gpu-test-cleanup: replace hardcoded pod reference with
  non-running pods listing

Signed-off-by: Davanum Srinivas <dsrinivas@nvidia.com>
dims force-pushed the improve-gpu-ci-workflows branch from 383946c to 18e1f46 February 20, 2026 01:29
dims merged commit 69c37d0 into NVIDIA:main Feb 20, 2026
15 checks passed